Automatically Creating Datasets For Measures Of Semantic Relatedness

Authors

Torsten Zesch

Iryna Gurevych

Abstract

Semantic relatedness is a special form of linguistic distance between words. Semantic relatedness measures are usually evaluated by comparison with human judgments. Previous test datasets were created analytically and were limited in size. We propose a corpus-based system for automatically creating test datasets. Experiments with human subjects show that the resulting datasets cover all degrees of relatedness. As a result of the corpus-based approach, the test datasets cover all types of lexical-semantic relations and contain domain-specific words naturally occurring in texts.
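As an illustration of what comparison with human judgments can look like in practice, the sketch below ranks a few word pairs by human rating and by a measure's score and computes the rank correlation between the two. The abstract does not say which statistic the authors use; Spearman's rank correlation is assumed here as one common choice, and the word pairs, ratings, and scores are hypothetical values invented for the example.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation; raises ZeroDivisionError if either list is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical word pairs with invented human ratings and measure scores.
pairs = [("car", "automobile"), ("doctor", "hospital"),
         ("noon", "string"), ("bread", "butter")]
human = [3.9, 3.1, 0.4, 1.8]        # assumed mean human judgments
measure = [0.92, 0.35, 0.05, 0.60]  # assumed scores from some measure

rho = spearman(human, measure)
print(f"Spearman correlation over {len(pairs)} pairs: {rho:.2f}")
```

A higher correlation means the measure orders the word pairs more like the human raters do. With only four pairs the value says little, which is why the size of the test dataset matters.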

Similar resources

Presentation of an efficient automatic short answer grading model based on combination of pseudo relevance feedback and semantic relatedness measures

Automatic short answer grading (ASAG) is the automated process of assessing answers based on natural language using computation methods and machine learning algorithms. Development of large-scale smart education systems on one hand and the importance of assessment as a key factor in the learning process and its confronted challenges, on the other hand, have significantly increased the need for ...

Evaluating Emergent Semantics in Folksonomies on Human Intuition

Semantic relations that closely resemble the human intuition of semantic relatedness, have been extracted automatically from sources, like text corpora, Wikipedia, or folksonomies, e.g., for constructing ontologies or for enhancing website navigation. Thereby, folksonomies are especially interesting since often, rich semantic structures emerge from the annotation of resources through users. In ...

Comparing Wikipedia and German Wordnet by Evaluating Semantic Relatedness on Multiple Datasets

We evaluate semantic relatedness measures on different German datasets showing that their performance depends on: (i) the definition of relatedness that was underlying the construction of the evaluation dataset, and (ii) the knowledge source used for computing semantic relatedness. We analyze how the underlying knowledge source influences the performance of a measure. Finally, we investigate th...

A Multimodal Vocabulary for Augmentative and Alternative Communication from Sound/Image Label Datasets

Existing Augmentative and Alternative Communication vocabularies assign multimodal stimuli to words with multiple meanings. The ambiguity hampers the vocabulary effectiveness when used by people with language disabilities. For example, the noun “a missing letter” may refer to a character or a written message, and each corresponds to a different picture. A vocabulary with images and sounds unamb...

Publication date: 2006
